Statistics in Medicine — Latest Matching Preprints

1

Simulation-Based Comparison of Controlled Interrupted Time Series (CITS) and Multivariable Regression

ORWA, F. O.; Mutai, C.; Nizeyimana, I.; Mwangi, A.

2026-04-13 health policy 10.64898/2026.04.10.26350670 medRxiv

Top 0.1%

10.2%

When randomized controlled trials are impractical, interrupted time series designs offer a rigorous quasi-experimental approach to assessing population-level policies. Indeed, among quasi-experimental designs (QEDs), the interrupted time series (ITS) method is commonly regarded as the most robust. However, ITS designs are susceptible to serial correlation and to confounding by time-varying factors associated with both the intervention and the outcome, which may bias inference. We therefore provide a simulation-based comparison of controlled interrupted time series (CITS) and multivariable regression (multivariable negative binomial regression) for estimating policy effects in count time series data. Both approaches are widely used in policy evaluation, yet their comparative performance in typical population health settings has rarely been examined directly. We tested both approaches under a variety of data-generating scenarios that differed in series length, intervention effect size, and magnitude of lag-1 autocorrelation. Performance was assessed by bias, standard error calibration, confidence interval coverage, mean squared error, and statistical power. Both methods gave unbiased estimates for moderate and large intervention effects, although bias was more pronounced for small effects, particularly in short series. Although point estimate performance was similar, inferential properties differed substantially. CITS consistently had smaller mean squared error, better agreement between model-based and empirical standard errors, and confidence interval coverage near the nominal 95% level under weak to moderate autocorrelation. By contrast, multivariable regression was more sensitive to serial dependence, leading to underestimated standard errors and undercoverage, especially at moderate to high autocorrelation, even with Newey-West adjustment.
These findings highlight the benefits of using a concurrent control series and the importance of structurally accounting for serial correlation when studying population-level policies with time series data.
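
The standard-error problem described above can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the authors' simulation design: it uses a single segmented-regression series with Gaussian AR(1) errors rather than count data, has no control series, and applies no Newey-West correction. The function names (`simulate_its`, `ols_level_change`, `compare_se`) and all parameter values are hypothetical. It compares the average naive OLS standard error of the level-change coefficient with the empirical standard deviation of that estimate across replications.

```python
import numpy as np


def simulate_its(n_time, phi, level_change, rng):
    """One segmented-regression series with stationary AR(1) Gaussian errors."""
    t = np.arange(n_time)
    post = (t >= n_time // 2).astype(float)  # intervention starts mid-series
    e = np.empty(n_time)
    e[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - phi ** 2))  # stationary start
    for i in range(1, n_time):
        e[i] = phi * e[i - 1] + rng.normal()
    y = 10.0 + 0.05 * t + level_change * post + e
    X = np.column_stack([np.ones(n_time), t, post])  # intercept, trend, level change
    return X, y


def ols_level_change(X, y):
    """OLS estimate and naive (independent-error) SE of the level-change coefficient."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[2], np.sqrt(cov[2, 2])


def compare_se(n_time=48, phi=0.6, level_change=1.0, n_rep=500, seed=1):
    """Compare the mean model-based SE with the empirical SD of the estimates."""
    rng = np.random.default_rng(seed)
    estimates, ses = [], []
    for _ in range(n_rep):
        X, y = simulate_its(n_time, phi, level_change, rng)
        b, s = ols_level_change(X, y)
        estimates.append(b)
        ses.append(s)
    estimates = np.array(estimates)
    return {
        "bias": float(estimates.mean() - level_change),
        "empirical_sd": float(estimates.std(ddof=1)),
        "mean_model_se": float(np.mean(ses)),
    }
```

With `phi=0.0` the two quantities should roughly agree; with positive autocorrelation such as `phi=0.6`, the mean model-based SE should fall noticeably below the empirical SD, which is the undercoverage mechanism the abstract describes.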

2

Regression-based Modeling of Spearman's Rho for Longitudinal Metabolomics and Mental Wellness in Breast Cancer Patients

Chen, Y.; Gui, T.; Huang, Z.; Quach, N.; Tu, S.; Liu, J.; Garrett, T. J.; Starkweather, A. R.; Lyon, D. E.; Shepherd, B. E.; Tu, X. M.; Lin, T.

2026-04-16 cancer biology 10.64898/2026.04.13.718341 medRxiv

Top 0.1%

9.2%

Summary: Chemotherapy in breast cancer (BC) can substantially affect mental wellness. Advances in metabolomics enable comprehensive profiling of metabolic changes over time during and after treatment, offering insights into biological mechanisms linking chemotherapy to mental health outcomes. To study the association between metabolite profiles and mental wellness, correlation-based analyses are particularly useful. Spearman's rho is a widely used correlation measure and a popular alternative to Pearson's correlation, since it also captures monotonic associations that are not linear. However, existing methods for Spearman's rho are not designed for longitudinal data and do not allow for covariate adjustment. In this paper, we propose a novel regression-based framework grounded in a class of semiparametric models, the functional response models, to extend this popular correlation measure to longitudinal settings with missing data under the missing at random assumption. This framework facilitates inference about changes in correlation over time and about the association of explanatory variables with such changes. We use simulation studies to evaluate the performance of the approach with moderate sample sizes. We apply the approach to a one-year longitudinal substudy of the EPIGEN study to examine the longitudinal association between metabolite profiles and mental wellness in BC patients undergoing chemotherapy. The identified metabolites may serve as candidates for future in-depth bioinformatics analyses and translational investigations.
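
For reference, the classical cross-sectional Spearman's rho is Pearson's correlation computed on ranks, with tied values given their average rank. The plain-Python sketch below shows only this baseline definition; it does not implement the functional-response-model extension to longitudinal data proposed in the paper, and the function names are illustrative.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For example, `spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125])` returns 1.0 because the relationship is perfectly monotonic, even though it is not linear.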

3

Causal estimands and target trials for the effect of lag time to treatment of cancer patients

Goncalves, B. P.; Franco, E. L.

2026-04-08 epidemiology 10.64898/2026.04.07.26350338 medRxiv

Top 0.1%

4.8%

Timeliness of therapy initiation is a fundamental determinant of outcomes for many medical conditions, most notably cancer. Yet inefficiencies in healthcare systems mean that delays between diagnosis and treatment frequently worsen clinical outcomes for cancer patients. Although estimates of the effects of lag time to therapy would inform policymakers considering resource allocation to minimize delays in oncology, causal methods are seldom explicitly discussed in epidemiologic analyses of these lag times. Here, we propose causal estimands for such studies and outline the protocol of a target trial that could be emulated with observational data on lag times. To illustrate the application of this approach, we simulate studies of lag time to treatment under two scenarios: one in which indication bias (Waiting Time Paradox) is present and another in which it is absent. Although our discussion focuses on oncologic outcomes, components of the proposed target trial could be adapted to study delays for other medical conditions. We believe that the clarity with which causal questions are posed under the target trial emulation framework would lead to improved quantification of the effects of lag times in oncology, and hence to better informed policy decisions.

4

A Machine Learning Based Causal Interface for Time-Varying Environmental Predictors of Substance Use Initiation in the ABCD Study

Wei, M.; Yadlapati, L.; Peng, Q.

2026-04-17 addiction medicine 10.64898/2026.04.15.26350988 medRxiv

Top 0.1%

3.9%

Background: The Adolescent Brain Cognitive Development (ABCD) Study provides rich longitudinal data on environmental, genetic, and behavioral factors related to substance use initiation. Classical marginal structural models (MSMs) require selecting covariates for propensity models, which is challenging when there are many correlated predictors. Methods: We analyzed longitudinal panel data from 11,868 ABCD participants with repeated observations over time. Interval-level binary outcomes were defined for initiation of alcohol, nicotine, cannabis, and any substance, including only participants at risk before initiation. All predictors were constructed as lagged variables to preserve temporal ordering. We used a two-stage machine learning-based causal framework. First, we performed graph discovery using a Granger-inspired lagged predictive modeling approach with elastic-net logistic regression to identify relationships between past predictors and future outcomes. Stable candidate edges were selected using subject-level bootstrap stability selection. Second, we estimated adjusted effects for stable predictors using double machine learning (DML) with partialling-out and cross-fitting. For each predictor, the lagged variable was treated as the exposure and adjusted for high-dimensional lagged covariates. Cross-fitting with group-based splitting accounted for within-subject dependence. Nuisance functions were estimated using random forests, and cluster-robust standard errors were used for inference. Results: We identified stable predictors across multiple domains, including sleep patterns, family environment, peer relationships, behavioral traits, and genetic risk. Many predictors were shared across substance outcomes, while some were outcome-specific. Effect sizes were modest, typically ranging from -0.01 to 0.02 per standard deviation increase in the predictor. Both risk-increasing and protective associations were observed. 
Risk factors included sleep disturbance and behavioral risk indicators, while protective factors included parental monitoring and structured environments. Conclusions: This study presents a practical framework for analyzing high-dimensional longitudinal data and identifying time-varying predictors of substance use initiation. The approach combines machine learning for variable selection with causal inference for effect estimation. The results highlight both shared and outcome-specific risk factors and identify modifiable targets, such as family environment and sleep, that may inform prevention strategies.
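
The second stage described above (partialling-out with cross-fitting and group-based splitting) can be sketched as follows. This is a simplified illustration under stated assumptions: linear least-squares nuisance models stand in for the random forests used in the study, there is a single exposure, and the cluster-robust variance omits small-sample corrections. The names `dml_partialling_out` and `ols_fit_predict` are hypothetical.

```python
import numpy as np


def ols_fit_predict(X_train, y_train, X_test):
    """Linear nuisance model; a stand-in for the random forests used in the study."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def dml_partialling_out(y, d, X, groups, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of the effect of exposure d on outcome y.

    Folds are assigned by group (e.g., subject ID), so all records from one
    subject fall in the same fold. Returns (theta, cluster-robust SE).
    """
    rng = np.random.default_rng(seed)
    unique_groups = np.unique(groups)
    shuffled = rng.permutation(unique_groups)
    fold_of_group = {g: k % n_folds for k, g in enumerate(shuffled)}
    folds = np.array([fold_of_group[g] for g in groups])

    y_res = np.empty(len(y))
    d_res = np.empty(len(d))
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Residualize outcome and exposure with models fit on the other folds
        y_res[test] = y[test] - ols_fit_predict(X[train], y[train], X[test])
        d_res[test] = d[test] - ols_fit_predict(X[train], d[train], X[test])

    # Final stage: regress outcome residuals on exposure residuals
    theta = (d_res @ y_res) / (d_res @ d_res)

    # Cluster-robust sandwich variance, summing scores within each group
    psi = d_res * (y_res - theta * d_res)
    cluster_sums = np.array([psi[groups == g].sum() for g in unique_groups])
    se = np.sqrt(np.sum(cluster_sums ** 2)) / (d_res @ d_res)
    return theta, se
```

Swapping `ols_fit_predict` for any regressor with a fit-and-predict step (for example, a random forest) keeps the same structure: residualize the outcome and the exposure out of fold, then regress residual on residual.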

5

The Rayleigh Quotient and Contrastive Principal Component Analysis II

Jackson, K. C.; Carilli, M. T.; Pachter, L.

2026-04-10 bioinformatics 10.64898/2026.04.08.717236 medRxiv

Top 0.1%

3.7%

Contrastive principal component analysis (PCA) methods are effective approaches to dimensionality reduction where variance of a target dataset is maximized while variance of a background dataset is minimized. We previously described how contrastive PCA problems can be written as solutions to generalized eigenvalue problems that maximize particular instantiations of the Rayleigh quotient. Here, we discuss two extensions of contrastive PCA: we use kernel weighting from spatial PCA (k-ρPCA) to contrast spatial and non-spatial axes of variation, and separately solve the Rayleigh quotient in the space of basis function coefficients (f-ρPCA) to find modes of variation in functional data. Together, these extensions expand the scope of contrastive PCA while unifying disparate fields of spatial and functional methods within a single conceptual and mathematical framework. We showcase the utility of these extensions with several examples drawn from genomics, analyzing gene expression in cancer and immune response to vaccination.
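
The generalized-eigenvalue formulation mentioned above can be illustrated for the simplest ratio form: maximize w'Aw / w'Bw, with A a target covariance and B a positive definite background covariance. The NumPy sketch below reduces the problem to a standard symmetric eigenproblem via a Cholesky factor of B; it does not implement the kernel-weighted (k-ρPCA) or basis-coefficient (f-ρPCA) variants, and the function names are illustrative.

```python
import numpy as np


def contrastive_direction(A, B):
    """Maximize the Rayleigh quotient w'Aw / w'Bw for symmetric A and SPD B.

    With B = L L' (Cholesky), A w = lam B w becomes C v = lam v where
    C = L^{-1} A L^{-T} and w = L^{-T} v.
    """
    L = np.linalg.cholesky(B)
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.T
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    w = L_inv.T @ eigvecs[:, -1]          # back-transform the top eigenvector
    return eigvals[-1], w / np.linalg.norm(w)


def rayleigh_quotient(A, B, w):
    """Ratio of target variance to background variance along direction w."""
    return (w @ A @ w) / (w @ B @ w)
```

In practice B is often regularized (for example, by adding a small multiple of the identity) so that the Cholesky factorization exists.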

6

Dynamic and Baseline Multi-Task Learning for Predicting Substance Use Initiation in the ABCD Study

Wei, M.; Zhang, H.; Peng, Q.

2026-04-13 addiction medicine 10.64898/2026.04.10.26350655 medRxiv

Top 0.1%

3.6%

Background: Early initiation of substance use is linked to later adverse outcomes, and risk factors come from multiple domains and are shared across substances. In our previous work, traditional time-to-event Cox models identified individual risk factors, but these models are not designed to jointly model multiple outcomes or capture complex non-linear relationships. Multi-task learning (MTL) can leverage shared structure across related outcomes to improve prediction and distinguish common versus substance-specific predictors. However, most MTL studies rely on baseline features and focus on single outcomes, which limits their ability to capture shared risk and temporal changes. Substance use initiation is a time-dependent process that unfolds during development and reflects changing exposures over time. Baseline-only models cannot capture these changes or represent risk dynamics. Discrete-time modeling provides a practical approach by estimating interval-level initiation risk and combining it into cumulative risk at the subject level. By integrating multi-task learning with dynamic modeling, it is possible to share information across outcomes while capturing how risk evolves over time, which may improve prediction performance. Methods: Using the Adolescent Brain Cognitive Development (ABCD) Study (release 5.1), we developed two complementary multi-task learning (MTL) frameworks to predict initiation of alcohol, nicotine, cannabis, and any substance use. A baseline MTL model predicted fixed-horizon (48-month) initiation using one record per participant, while a dynamic discrete-time MTL model incorporated longitudinal interval data to model time-varying risk. Both models used multi-domain environmental exposures, core covariates, and polygenic risk scores (PRS). Performance was evaluated on a held-out test set using AUROC, PR-AUC, and calibration metrics, and compared with single-task logistic regression (LR).
Feature importance was assessed using permutation importance and compared with Cox proportional hazards models. Results: MTL showed comparable or improved performance relative to LR, with larger gains for low-prevalence outcomes (cannabis and nicotine). Incorporating longitudinal information led to consistent improvements across all outcomes. Dynamic models increased AUROC by +0.044 to +0.062 for MTL and +0.050 to +0.084 for LR, indicating that temporal information was the primary driver of performance gains. Feature importance analyses showed modest overlap across methods, with higher agreement between dynamic MTL and Cox models than static MTL. A small set of features, including externalizing behavior, parental monitoring, and developmental factors, were consistently identified across all approaches. Conclusions: Dynamic multi-task learning improves the prediction of substance use initiation by leveraging longitudinal structure and shared information across outcomes. While MTL provides additional gains, incorporating time-varying information is the dominant factor for improving performance. Combining baseline and dynamic frameworks offers a comprehensive strategy for identifying robust risk factors and modeling adolescent substance use initiation.
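
The discrete-time step described in the abstract, combining interval-level initiation risks into a subject-level cumulative risk, follows from the product rule for survival. A minimal sketch, assuming each interval hazard is the conditional probability of initiating in that interval given no earlier initiation (the function name is hypothetical):

```python
def cumulative_risk(hazards):
    """Combine interval-level hazards h_t into subject-level cumulative risk.

    P(initiation by the last interval) = 1 - prod_t (1 - h_t), where h_t is the
    probability of initiating in interval t given no initiation earlier.
    """
    survival = 1.0
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("hazards must lie in [0, 1]")
        survival *= 1.0 - h  # probability of remaining initiation-free
    return 1.0 - survival
```

For example, hazards of 0.1 and 0.2 in two consecutive intervals give a cumulative risk of 1 - (0.9)(0.8) = 0.28.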

7

Mediation analysis in longitudinal data: an unbiased estimator for cumulative indirect effect

Li, Y.; Cabral, H.; Tripodis, Y.; Ma, J.; Levy, D.; Joehanes, R.; Liu, C.; Lee, J.

2026-04-20 epidemiology 10.64898/2026.04.18.26351189 medRxiv

Top 0.1%

3.3%

Mediation analysis quantifies how an exposure affects an outcome through an intermediate variable. We extend mediation analysis to capture the cumulative effects of longitudinal predictors on longitudinal outcomes. Our proposed model examines how mediators transmit the effects of current and previous exposures on the current outcome. We construct a least-squares estimator for the cumulative indirect effect (CIE) and use three approaches (exact form, delta method, and bootstrap) to estimate its standard error (SE). The CIE estimator is unbiased under the assumptions of no unmeasured confounding and independent errors between the mediator and outcome models at all time points, as shown both analytically and in simulations. Although the three SE estimates are numerically similar, the bootstrap is recommended because it is simple to implement. We apply this method to the Framingham Heart Study Offspring Cohort to assess whether DNA methylation mediates the association of alcohol consumption with systolic blood pressure over two time points. We identify two CpGs (cg05130679 and cg05465916) as mediators and construct a composite DNA methylation score from 11 CpGs, which mediates 39% of the cumulative effect. In conclusion, we propose an unbiased estimator for the CIE. Future studies will address missingness in mediators and outcomes.
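
The product-of-coefficients logic and the recommended bootstrap SE can be sketched for the simplest case of a single time point. This is a simplified illustration, not the authors' CIE estimator: their longitudinal model combining current and previous exposures is not specified here, and with repeated measures the bootstrap should resample subjects rather than rows. The function names are hypothetical.

```python
import numpy as np


def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a*b from two linear models.

    Mediator model: m = a0 + a*x + e1
    Outcome model:  y = b0 + c*x + b*m + e2
    """
    Xa = np.column_stack([np.ones_like(x), x])
    a = np.linalg.lstsq(Xa, m, rcond=None)[0][1]
    Xb = np.column_stack([np.ones_like(x), x, m])
    b = np.linalg.lstsq(Xb, y, rcond=None)[0][2]
    return a * b


def bootstrap_se(x, m, y, n_boot=1000, seed=0):
    """Nonparametric bootstrap SE of the indirect effect, resampling rows."""
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = np.empty(n_boot)
    for r in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        draws[r] = indirect_effect(x[idx], m[idx], y[idx])
    return draws.std(ddof=1)
```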

8

Generative AI-assisted Bayesian-frequentist Hybrid Inference in Single-cell RNA Sequencing Analysis for Genes Associated with Alzheimer's Disease

Han, G.; Yuan, A.; Oware, K. D.; Wright, F.; Carroll, R. J.; Smith, M.; Ory, M. G.; Yan, D.; Wang, W.; Sun, Z.; Dai, Q.; Allen, C.; Dang, A.; Liu, Y.

2026-04-20 geriatric medicine 10.64898/2026.04.17.26351142 medRxiv

Top 0.1%

2.1%

Alzheimer's disease genomics and other high-dimensional omics studies demand powerful statistical methods, yet Bayesian inference remains underutilized despite its advantages in small-sample settings, owing to the prohibitive cost of eliciting reliable priors across thousands or millions of parameters. We propose an AI-assisted Bayesian-frequentist hybrid inference framework that couples large language model-based prior elicitation with the hybrid inference theory of Yuan (2009). ChatGPT-4o is queried via a standardized prompt to assess the strength of evidence linking each gene to a disease of interest, and the response is mapped to an informative normal prior via a standardized effect-size calibration. Parameters for covariates of secondary interest are treated as frequentist parameters, preserving efficiency and avoiding sensitivity to mis-specified priors. We derive closed-form hybrid estimators under uniform and conjugate normal priors in linear models, establish their asymptotic equivalence to the frequentist and full Bayes estimators, and show in simulations that hybrid inference using unconditional variance estimation leads to high statistical power while accurately controlling the Type I error rate. Applied to single-cell RNA sequencing data from the ROSMAP cohort for Alzheimer's disease as an example, the framework identifies biologically coherent pathways (such as gamma-secretase pathways) that were previously undetected. The proposed framework offers a principled and computationally scalable approach to genome-wide Bayesian analysis, with potential for broad application across omics platforms and disease settings.
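
As a rough intuition for how an informative normal prior on a gene effect combines with a frequentist fit, the normal-normal update below weights the estimate and the prior by their precisions. This is a simplified sketch under a normal approximation, not the closed-form hybrid estimator of Yuan (2009) derived in the paper; it assumes `beta_hat` and `se` already come from a frequentist model that adjusts for the covariates, and it does not show how LLM responses are mapped to priors. The function name is hypothetical.

```python
def normal_prior_update(beta_hat, se, prior_mean, prior_sd):
    """Precision-weighted combination of an estimate and a normal prior.

    Treats beta_hat ~ N(beta, se^2) as the likelihood for the coefficient of
    interest and beta ~ N(prior_mean, prior_sd^2) as the prior; returns the
    posterior mean and posterior SD.
    """
    w_data = 1.0 / se ** 2         # precision of the frequentist estimate
    w_prior = 1.0 / prior_sd ** 2  # precision of the elicited prior
    post_var = 1.0 / (w_data + w_prior)
    post_mean = post_var * (w_data * beta_hat + w_prior * prior_mean)
    return post_mean, post_var ** 0.5
```

With `beta_hat=1.0`, `se=1.0` and a N(0, 1) prior, the posterior mean is 0.5 and the posterior SD is about 0.707; as `prior_sd` grows, the result approaches the frequentist estimate.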

9

Widespread genetic effect heterogeneity impacts bias and power in nonlinear Mendelian randomization

Wang, J.; Morrison, J.

2026-04-20 epidemiology 10.64898/2026.04.17.26351133 medRxiv

Top 0.2%

1.7%

Mendelian randomization (MR) uses genetic variants as instrumental variables to infer causal relationships between complex traits. Standard MR can be used to estimate an average causal effect at the population level, and typically assumes a linear exposure-outcome relationship. Recently, several methods for estimating nonlinear effects have been developed. However, many have been found to produce spurious empirical findings when subjected to negative control analyses. We propose that this poor performance may be attributable to heterogeneity in variant-exposure associations. We demonstrate that heterogeneous genetic effects on exposure lead to biased estimates, poor coverage, and inflated type I error in control function and stratification-based methods. In contrast, two-stage least squares (TSLS) methods are robust to such heterogeneity, but suffer from low precision and low power in some circumstances. We show that a statistical test for heterogeneity can be used to guide the choice of nonlinear MR methods. Using UK Biobank data, we reassess the causal effects of BMI, vitamin D, and alcohol consumption on blood pressure, lipids, C-reactive protein, and age (negative control). We find strong evidence of heterogeneity for all three exposures, and also recapitulate previous results that control function and stratification-based methods are prone to false positives. Finally, using nonparametric TSLS, we identify evidence of nonlinear causal effects of BMI on HDL cholesterol, triglycerides, and C-reactive protein; however, specific estimates of the shape of these relationships are imprecise. Altogether, our results suggest that common nonlinear MR methods are unreliable in the presence of realistic levels of heterogeneity, and that more methodological development is required before practically useful nonlinear MR is feasible.
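
For reference, the linear two-stage least squares estimator, which the abstract contrasts with control-function and stratification approaches, can be sketched in NumPy. This is only the linear point estimate (naive second-stage standard errors are not valid), not the nonparametric TSLS used in the paper, and the function name and simulated setup are illustrative.

```python
import numpy as np


def tsls(y, x, z):
    """Linear two-stage least squares point estimate for one exposure x.

    z holds one or more instruments (e.g., genotype dosages), shape (n,) or (n, k).
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    # Stage 1: fitted exposure from the instruments
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    # Stage 2: regress the outcome on the fitted exposure
    X2 = np.column_stack([np.ones(n), x_hat])
    return np.linalg.lstsq(X2, y, rcond=None)[0][1]
```

In simulated data with an unmeasured confounder affecting both exposure and outcome, ordinary least squares is biased while `tsls` recovers the causal slope, provided the instruments satisfy the usual instrumental-variable assumptions.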

10

An Empirical Assessment of Inferential Reproducibility of Linear Regression in Health and Biomedical Research Papers

Jones, L.; Barnett, A.; Hartel, G.; Vagenas, D.

2026-04-07 health systems and quality improvement 10.64898/2026.04.07.26350296 medRxiv

Top 0.2%

1.7%

Background: In health research, variability in modelling decisions can lead to different conclusions even when the same data are analysed, a challenge to what is known as inferential reproducibility. In linear regression analyses, incorrect handling of key assumptions, such as normality of the residuals and linearity, can undermine reproducibility. This study examines how violations of these assumptions influence inferential conclusions when the same data are reanalysed. Methods: We randomly sampled 95 health-related PLOS ONE papers from 2019 that reported linear regression in their methods. Data were available for 43 papers, and 20 were assessed for computational reproducibility, with three models per paper evaluated. The 14 papers with at least one model that was at least partially computationally reproduced were then examined for inferential reproducibility. To assess the impact of assumption violations, coefficients, 95% confidence intervals, and model fit were compared between the original and reproduced analyses. Results: Of the fourteen papers assessed, only three were inferentially reproducible. The most frequently violated assumptions were normality and independence, each occurring in eight papers. Violations of independence were particularly consequential and were commonly associated with inferential failure. Although reproduced analyses often retained the same binary statistical significance classification as the original studies, confidence intervals were frequently wider, indicating greater uncertainty and reduced precision. Such uncertainty may affect the interpretation of results and, in turn, influence treatment decisions and clinical practice. Conclusion: Our findings demonstrate that substantial violations of key modelling assumptions often went undetected by authors and peer reviewers and, in many cases, were associated with inferential reproducibility failure. This highlights the need for stronger statistical education and greater transparency in modelling decisions.
Rather than applying rigid or misinformed rules, such as incorrectly testing the normality of the outcome variable, researchers should adopt modelling frameworks guided by the research question and the study design. When assumptions are violated, appropriate alternatives, such as robust methods, bootstrapping, generalized linear models, or mixed-effects models, should be considered. Given that assumption violations were common even in relatively simple regression models, early and sustained collaboration with statisticians is critical for supporting robust, defensible, and clinically meaningful conclusions.
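
The point about testing the normality of the outcome rather than of the residuals can be made concrete with a toy example. In the NumPy sketch below (hypothetical function names and simulated data), the outcome depends strongly on a binary covariate, so its marginal distribution is bimodal and far from normal, while the regression residuals are normal by construction, as the linear model assumes.

```python
import numpy as np


def excess_kurtosis(v):
    """Sample excess kurtosis (0 for a normal distribution)."""
    c = np.asarray(v, dtype=float)
    c = c - c.mean()
    return float(np.mean(c ** 4) / np.mean(c ** 2) ** 2 - 3.0)


def outcome_vs_residual_kurtosis(n=5000, seed=0):
    """Simulate y = 10*group + N(0, 1); return (outcome, residual) excess kurtosis."""
    rng = np.random.default_rng(seed)
    group = rng.integers(0, 2, size=n).astype(float)  # binary covariate
    y = 10.0 * group + rng.normal(size=n)             # errors are N(0, 1)
    X = np.column_stack([np.ones(n), group])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return excess_kurtosis(y), excess_kurtosis(resid)
```

Here the outcome's excess kurtosis is strongly negative (a bimodal shape), whereas the residuals' is close to 0, so a normality check applied to the outcome would wrongly flag a correctly specified model.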

11

Benchmarking precision matrix estimation methods for differential co-expression network analysis

Overmann, M.; Grabert, G.; Kacprowski, T.

2026-04-15 bioinformatics 10.64898/2026.04.13.716081 medRxiv

Top 0.2%

1.3%

Background: Gene expression profiling is widely used to investigate disease mechanisms, but classical approaches such as differential expression or pairwise correlation analyses provide limited interpretability. Network-based differential co-expression methods that model conditional dependencies through partial correlations offer richer insights, yet their application in high-dimensional settings requires estimation of precision matrices. Numerous precision matrix estimation methods (PMEMs) have been proposed, but their relative performance under various conditions remains unclear. Results: Simulated gene expression datasets with known ground truth correlation structures were used to benchmark a broad set of PMEMs. Performance was strongly affected by data characteristics, including covariance structure, matrix density, covariance values, sample size-to-dimension ratio, and sampling distribution. Among the evaluated methods, GLassoElnetFast consistently showed the highest accuracy in recovering differential edges, although high signal-to-noise ratios and sufficient sample sizes remain essential for reliable inference. Conclusions: Evaluation across diverse simulation conditions demonstrated that no single metric or condition was sufficient to assess PMEM performance. Therefore, previous, less extensive evaluations risked misleading conclusions. Our simulation and benchmarking framework supports future method development and ensures reproducible evaluation of newly developed approaches.
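
The link between precision matrices and the partial correlations used in differential co-expression networks is a standard identity: for a precision matrix Θ, the partial correlation between variables i and j given all others is -Θ_ij / sqrt(Θ_ii Θ_jj). The sketch below assumes the precision matrices have already been estimated (for example, by one of the benchmarked PMEMs) and flags edges whose partial correlation differs between two conditions by more than an arbitrary, hypothetical threshold.

```python
import numpy as np


def partial_correlations(precision):
    """Convert a precision matrix Theta into a partial correlation matrix.

    rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj) for i != j; diagonal set to 1.
    """
    theta = np.asarray(precision, dtype=float)
    d = np.sqrt(np.diag(theta))
    pcor = -theta / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def differential_edges(prec_a, prec_b, threshold=0.1):
    """Index pairs (i, j), i < j, whose partial correlation differs by > threshold."""
    diff = np.abs(partial_correlations(prec_a) - partial_correlations(prec_b))
    i, j = np.triu_indices_from(diff, k=1)
    mask = diff[i, j] > threshold
    return list(zip(i[mask].tolist(), j[mask].tolist()))
```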

12

HHBayes: A Flexible Bayesian Framework for Simulating and Analyzing Household Transmission Dynamics

Li, K.; Hou, Y.; Mukherjee, B.; Pitzer, V. E.; Weinberger, D. M.

2026-04-03 infectious diseases 10.64898/2026.04.01.26349903 medRxiv

Top 0.2%

1.3%

Household transmission studies are important for understanding infectious disease transmission and evaluating interventions; however, they are frequently constrained by methodological challenges, including in study design and sample size determination, and in estimating parameters of interest after collecting the data. Existing tools often lack flexibility in modeling age-specific susceptibility, infectivity patterns, and the impact of interventions such as vaccination or prophylaxis. Here, we develop HHBayes, an open-source R package that provides a unified framework for simulating and analyzing household transmission data using Bayesian methods. The package enables researchers to: (1) simulate realistic household transmission dynamics with highly customizable variables; (2) incorporate viral load data (measured in viral copies/mL or cycle threshold values) to model time-varying infectiousness; (3) estimate age-dependent susceptibility and infectivity parameters using Hamiltonian Monte Carlo methods implemented in Stan; and (4) evaluate intervention effects through user-defined covariates that modify susceptibility or infectivity. We demonstrate the capabilities of the package through simulation studies showing accurate parameter recovery and applications to seasonal respiratory virus transmission, including the impact of vaccination and antiviral prophylaxis on household attack rates. HHBayes addresses a critical gap in infectious disease epidemiology by providing researchers with accessible tools for both prospective study design and retrospective data analysis. The flexibility of the package in handling complex household structures, time-varying infectiousness, and intervention effects makes it valuable for studying diverse pathogens.
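
To give a sense of what simulating household transmission involves, the plain-Python sketch below uses the classic Reed-Frost chain-binomial model with a single, constant per-contact transmission probability. It is a swapped-in textbook model, not HHBayes: it ignores age-specific susceptibility, viral-load-dependent infectiousness, and interventions, and the function names are hypothetical rather than part of the package's API.

```python
import random


def reed_frost_household(size, p_transmit, n_index=1, rng=None):
    """Final number of secondary infections in one household (Reed-Frost).

    Each generation, every remaining susceptible escapes infection from each
    currently infectious member independently with probability 1 - p_transmit.
    """
    if rng is None:
        rng = random.Random()
    susceptible = size - n_index
    infectious = n_index
    secondary = 0
    while infectious > 0 and susceptible > 0:
        p_inf = 1.0 - (1.0 - p_transmit) ** infectious
        new = sum(1 for _ in range(susceptible) if rng.random() < p_inf)
        susceptible -= new
        secondary += new
        infectious = new  # only the newly infected transmit next generation
    return secondary


def secondary_attack_rate(sizes, p_transmit, seed=0):
    """Pooled secondary attack rate over households with one index case each."""
    rng = random.Random(seed)
    infected = sum(reed_frost_household(s, p_transmit, rng=rng) for s in sizes)
    at_risk = sum(s - 1 for s in sizes)
    return infected / at_risk
```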

13

Robustly Quantifying Uncertainty in International Avian Influenza A(H5N1) Infection Fatality Ratios

Gada, L.; Afuleni, M. K.; Noble, M.; House, T.; Finnie, T.

2026-04-23 public and global health 10.64898/2026.04.22.26351373 medRxiv

Top 0.2%

1.2%

Knowing the mortality rates associated with infection by a pathogen is essential for effective preparedness and response. Here, harnessing the flexibility of a Bayesian approach, we produce an estimate of the Infection Fatality Ratio (IFR) for A(H5N1) conditional on explicit assumptions, and quantify the uncertainty thereof. We also apply the method to first-wave COVID-19 data up to March 2020, demonstrating the estimates that could have been obtained had the model been available then. Our analysis uses World Development Indicators (WDI) from the World Bank, the WHO tracker of confirmed A(H5N1) cases and deaths by country (2003-2024), and COVID-19 cases and deaths data from Johns Hopkins University (January and February 2020). Since infectious disease dynamics are typically influenced by local socio-economic factors rather than political borders, individual countries are placed within clusters of countries sharing similar WDIs relevant to respiratory viral diseases, with clusters derived by hierarchical clustering. To estimate the IFR, we fit a Negative Binomial Bayesian Hierarchical Model for A(H5N1) and COVID-19 separately. We explicitly modelled key unobserved parameters with informative priors from expert opinion and literature. By modelling underreporting, our analysis suggests lower fatality (15.3%) compared to WHO's Case Fatality Ratio estimate (54%) on lab-confirmed cases. However, credible intervals are wide ([0.5%, 64.2%] 95% CrI). Therefore, good preparedness for a potential A(H5N1) pandemic implies adopting scenario planning under our central estimate, as well as for IFRs as high as 70%. Our approach also returns a COVID-19 IFR estimate of 2.8% with [2.5%, 3.1%] 95% CrI, which is consistent with the literature.
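
The reason modelling underreporting lowers the fatality estimate relative to a case fatality ratio can be shown with a crude identity. Assuming every death is reported and confirmed cases are a fraction a of all infections, infections = confirmed / a, so IFR = CFR × a. The sketch below is a deterministic illustration with hypothetical inputs; the paper's Bayesian hierarchical negative binomial model instead treats underreporting as uncertain and propagates that uncertainty into the credible interval.

```python
def ifr_from_confirmed(deaths, confirmed_cases, ascertainment):
    """Crude IFR when only a fraction of infections is confirmed.

    Assumes all deaths are reported and confirmed cases are a fraction
    `ascertainment` (0 < a <= 1) of all infections, so IFR = CFR * a.
    """
    if not 0 < ascertainment <= 1:
        raise ValueError("ascertainment must be in (0, 1]")
    cfr = deaths / confirmed_cases
    return cfr * ascertainment
```

With full ascertainment the IFR equals the CFR; with hypothetical inputs of 50 deaths among 100 confirmed cases and 20% ascertainment, the crude IFR falls to 0.1.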

14

Testing and Estimating Causal Treatment Effect Heterogeneity in Observational Studies via Revised Deep Semiparametric Regression: A Lung Transplant Case Study

Yuan, S.; Zou, F.; Zou, B.

2026-04-15 bioinformatics 10.64898/2026.04.13.718254 medRxiv

Top 0.3%

0.9%

Lung transplantation programs must decide when bilateral lung transplantation (BLT) offers meaningful functional benefit over single lung transplantation (SLT). Because donor and recipient characteristics jointly shape outcomes, the BLT-SLT contrast may differ across patients. However, analyzing observational registries poses a statistical challenge: apparent subgroup differences can be artifacts of complex confounding, while true heterogeneity can be missed or poorly quantified. Using a large national registry, we investigate whether the BLT effect varies across recipients and identify clinically relevant profiles of benefit using post-transplant lung function measured by forced expiratory volume in 1 second (FEV1). We develop deepHTL, a framework that tests for treatment effect heterogeneity and estimates how the BLT-SLT effect varies with patient features. In extensive simulations designed to resemble registry-like confounding, deepHTL controls false positives for detecting heterogeneity and yields more accurate individualized effect estimates than common machine learning methods. In the lung transplant cohort, we find strong evidence of heterogeneity in the BLT-SLT effect on FEV1: younger, lower risk recipients with better baseline status show the largest FEV1 gains from BLT, whereas older, higher risk candidates exhibit diminished marginal benefit. These findings provide statistically grounded guidance for patient selection and allocation of scarce donor organs.

15

Interpretability as stability under perturbation reveals systematic inconsistencies in feature attribution

Piorkowska, N. J.; Olejnik, A.; Ostromecki, A.; Kuliczkowski, W.; Mysiak, A.; Bil-Lula, I.

2026-04-22 health informatics 10.64898/2026.04.20.26351354 medRxiv

Top 0.3%

0.8%

Interpreting machine learning models typically relies on feature attribution methods that quantify the contribution of individual variables to model predictions. However, it remains unclear whether attribution magnitude reflects the true functional importance of features for model performance. Here, we present a unified interpretability framework integrating permutation-based attribution, feature ablation, and stability under perturbation across multiple feature spaces. Using nested cross-validation and permutation-based null diagnostics, we systematically evaluate the relationship between attribution magnitude and functional dependence in clinical and biomarker-based prediction models. Attribution magnitude is frequently misaligned with functional importance, with weak to strong negative correlations observed across feature spaces (Spearman ρ ranging from -0.374 to -0.917). Features with high attribution often have limited impact on model performance when removed, whereas features with low attribution can be essential for maintaining predictive accuracy. These discrepancies define distinct classes of interpretability failure, including attribution excess and latent dependence. Interpretability further depends on feature space composition, and stable, functionally relevant features are not necessarily those with the highest attribution scores. By integrating attribution, functional impact, and stability into a composite Feature Reliability Score, we identify features that remain informative across perturbations and analytical contexts. These findings indicate that interpretability does not arise from attribution magnitude alone but is better characterized by stability under perturbation. This framework provides a basis for more robust model interpretation and highlights limitations of attribution-centric approaches in high-dimensional and correlated data settings.
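
The gap between attribution and functional dependence can be reproduced with two nearly collinear features. In the NumPy sketch below (hypothetical function names; in-sample evaluation for brevity), shuffling one feature without refitting inflates the error sharply, so permutation importance is large, whereas dropping that feature and refitting barely hurts because its near-duplicate compensates. This illustrates a pattern resembling what the abstract calls attribution excess; it is not the authors' Feature Reliability Score.

```python
import numpy as np


def fit_ols(X, y):
    """OLS coefficients with an intercept in position 0."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]


def predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def mse(y, yhat):
    return float(np.mean((y - yhat) ** 2))


def permutation_importance(X, y, coef, j, rng):
    """Increase in MSE when column j is shuffled and the model is NOT refit."""
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    return mse(y, predict(coef, Xp)) - mse(y, predict(coef, X))


def ablation_importance(X, y, j):
    """Increase in MSE when column j is dropped and the model IS refit."""
    Xd = np.delete(X, j, axis=1)
    base = mse(y, predict(fit_ols(X, y), X))
    return mse(y, predict(fit_ols(Xd, y), Xd)) - base
```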

16

Covariate adjustment for hierarchical outcomes and the win ratio: how to do it and is it worthwhile?

Hazewinkel, A.-D.; Gregson, J.; Bartlett, J. W.; Gasparyan, S. B.; Wright, D.; Pocock, S.

2026-03-31 cardiovascular medicine 10.64898/2026.03.30.26347966 medRxiv

Top 0.4%

0.7%

Objectives: To introduce a new covariate adjustment method for hierarchical outcomes using ordinal logistic regression, to compare it with existing approaches, and to assess whether adjustment improves power in randomized trials with hierarchical outcomes. Methods: We developed an ordinal regression-based method for covariate adjustment of the win ratio and compared it with three alternatives: probability index models, inverse probability weighting, and a randomization-based estimator. The methods were applied to the EMPEROR-Preserved trial and tested through extensive simulations involving two common hierarchical outcome structures: time-to-event composites, and composites combining time-to-event with quantitative measures. Simulations assessed the impact on estimates, standard errors, and power across prognostic and non-prognostic settings. Results: In RCT data and simulations, covariate adjustment consistently increased power when adjusting for prognostic baseline variables. Gains were comparable to or greater than those in conventional Cox models, with no power loss for non-prognostic covariates. Our ordinal approach performed similarly to existing methods while providing interpretable covariate effect estimates. Adjusting for baseline values of quantitative components yielded power gains that depended on the baseline-to-follow-up correlation. Conclusions: Covariate adjustment for prognostic variables meaningfully improves efficiency in win ratio analyses for hierarchical outcomes. Our ordinal method is easily implemented and facilitates interpretation of covariate effects. We recommend broader adoption of covariate adjustment, and of our ordinal method, in randomized trials using hierarchical outcomes.
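For readers unfamiliar with the estimand being adjusted, a minimal sketch of the unadjusted win ratio: every treated patient is compared with every control patient, component by component in order of priority, and the number of wins is divided by the number of losses. It assumes fully observed outcomes (no censoring) and a hypothetical two-level hierarchy (survival status, then a quantitative score, higher is better for both); it does not implement the authors' ordinal-regression adjustment.

```python
from typing import List, Sequence, Tuple

Outcome = Tuple[float, ...]  # components ordered by priority; higher is better


def compare(a: Outcome, b: Outcome) -> int:
    """Return +1 if a wins, -1 if a loses, 0 if tied on every component."""
    for x, y in zip(a, b):
        if x > y:
            return 1
        if x < y:
            return -1
    return 0  # tie: move to no further component


def win_ratio(treated: Sequence[Outcome], control: Sequence[Outcome]):
    """Unadjusted win ratio over all treated-control pairs."""
    wins = losses = ties = 0
    for t in treated:
        for c in control:
            result = compare(t, c)
            if result > 0:
                wins += 1
            elif result < 0:
                losses += 1
            else:
                ties += 1
    ratio = wins / losses if losses else float("inf")
    return ratio, wins, losses, ties


# Hypothetical patients: (survived: 1/0, quality-of-life score)
treated: List[Outcome] = [(1, 5.0), (0, 3.0)]
control: List[Outcome] = [(0, 4.0), (1, 5.0)]
print(win_ratio(treated, control))  # (0.5, 1, 2, 1)
```

Covariate adjustment methods such as those compared in the preprint replace this raw pairwise ratio with a model-based estimate; the sketch only shows the quantity being adjusted.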

17

Dual Nanoparticle-Driven Therapeutics for Leishmaniasis: A Mathematical Model of Targeted Macrophage and Parasite Elimination

Arumugam, D.; Ghosh, M.

2026-03-30 immunology 10.64898/2026.03.27.714640 medRxiv

Top 0.5%

0.5%

Background: To control leishmaniasis, chemotherapy drugs are currently under development. However, these drugs often show poor efficacy and are associated with toxicity, adverse effects, and drug resistance. At present, no specific drug is available for the treatment of leishmaniasis, and vaccine research is ongoing. Recent studies have analysed some experimental vaccines using mathematical models. Aim: In previous work, drug targeting addressed the entire human body rather than specifically targeting infected macrophages and parasites. In our current approach, we aim to eliminate infected macrophages and parasites through nano-drug design, using two types of nanoparticles: iron oxide and citric acid-coated iron oxide. We pursue this strategy through mathematical modelling of macrophage-parasite interactions. Methods: We design PDE-based models of macrophages and parasites, incorporating cytokine dynamics, to support nano-drug development. Drug efficacy is estimated using posterior distributions to analyse phenotypic fluctuations of macrophages and parasites during the design phase. We investigate implicit and semi-implicit treatment schemes, focusing on their energy decay properties. To model drug flow during treatment, we introduce a three-phase moving boundary problem. Comparative analyses evaluate macrophage and parasite behaviour with and without treatment. Finally, the entire framework is implemented within a virtual lab environment. Results: The nano-drug shows better efficacy than combined drug doses. We analysed and compared the two types of nano-drug particles, iron oxide and citric acid-coated iron oxide, and discuss how the drug targets and eliminates infected macrophages and parasites. Conclusion: Our model results and simulations, performed within a virtual lab environment, will support researchers conducting further studies in nano-drug design for leishmaniasis.
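As a generic illustration of what "semi-implicit" and "energy decay" mean here (the authors' macrophage-parasite-cytokine equations are not given in the abstract), the sketch below applies an implicit-explicit (IMEX) Euler step to a hypothetical 1D diffusion-decay equation u_t = D u_xx - k u with zero Dirichlet boundaries: diffusion is treated implicitly and the linear decay term explicitly. The grid size, coefficients and initial density bump are all assumptions.

```python
import numpy as np


def laplacian_dirichlet(n, h):
    # Second-difference matrix on n interior points, zero boundary values.
    main = -2.0 * np.ones(n)
    off = np.ones(n - 1)
    return (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2


def imex_step(u, dt, diff, decay, lap):
    # Solve (I - dt*D*L) u_new = (1 - dt*k) u_old:
    # diffusion implicit, decay explicit.
    a = np.eye(len(u)) - dt * diff * lap
    return np.linalg.solve(a, (1.0 - dt * decay) * u)


def energy(u, h):
    # Discrete L2 energy 0.5 * h * sum(u^2).
    return 0.5 * h * float(np.sum(u**2))


def simulate(n=50, steps=100, dt=0.01, diff=0.1, decay=0.5):
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)        # interior grid points
    u = np.exp(-100.0 * (x - 0.5) ** 2)   # hypothetical initial density bump
    lap = laplacian_dirichlet(n, h)
    energies = [energy(u, h)]
    for _ in range(steps):
        u = imex_step(u, dt, diff, decay, lap)
        energies.append(energy(u, h))
    return np.array(energies)


e = simulate()
print(e[0], e[-1])  # energy at the first and last step
```

Because I - Δt·D·L is symmetric positive definite with eigenvalues of at least 1, and 0 < 1 - Δt·k < 1 for these parameters, each step can only shrink ‖u‖, so the discrete energy decreases monotonically regardless of the diffusion time step.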

18

Demystifying Clone-Censor-Weight Method in Target Trial Emulation: A Real-World Study of HPV Vaccination Strategies

Lin, T.; Li, Y.; Huang, Z.; Gui, T. T.; Wang, W.; Guo, Y.

2026-04-22 health informatics 10.64898/2026.04.21.26351413 medRxiv

Top 0.5%

0.5%

Target trial emulation (TTE) offers a principled way to estimate treatment effects using real-world observational data, but analyses of time-varying treatment strategies remain vulnerable to immortal time bias. The clone-censor-weight (CCW) approach is increasingly used to address this problem, yet key aspects of its causal interpretation and implementation remain unclear. In this work, we emulate a target trial using electronic health records (EHRs) to compare completion of a 3-dose 9-valent human papillomavirus (HPV) vaccination series within 12 months versus remaining partially vaccinated among vaccine initiators. We link CCW to the classic potential outcomes framework in causal inference, evaluate the role of different weighting mechanisms, and account for the within-subject correlation induced by cloning using cluster-robust variance estimation. Our study provides practical guidance for applying CCW in real-world comparative effectiveness studies to address immortal time bias, and supports more rigorous and interpretable treatment effect estimation in TTE.
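A minimal sketch of the clone and censor steps, assuming a hypothetical record layout (months from first dose to third dose, and months to the outcome or end of follow-up). Each initiator is copied into both strategies, and each copy is artificially censored at the point its observed history stops being compatible with its assigned strategy. Events that occur before any deviation count in both arms, which is how cloning avoids immortal time. The weighting step (inverse probability of censoring weights fitted to these artificial censoring times) and the cluster-robust variance keyed on `pid` are not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    pid: int
    completion_time: Optional[float]  # months to dose 3; None if never completed
    end_time: float                   # months to outcome event or end of follow-up
    event: bool                       # True if the outcome occurred at end_time


@dataclass
class Clone:
    pid: int
    arm: str
    end_time: float
    event: bool


def clone_and_censor(people: List[Person], grace: float = 12.0) -> List[Clone]:
    rows: List[Clone] = []
    for p in people:
        completes_in_grace = (
            p.completion_time is not None and p.completion_time <= grace
        )
        # Arm "complete": deviates if the grace period ends without completion.
        if completes_in_grace or p.end_time <= grace:
            rows.append(Clone(p.pid, "complete", p.end_time, p.event))
        else:
            rows.append(Clone(p.pid, "complete", grace, False))  # artificial censoring
        # Arm "partial": deviates at the moment the series is completed.
        if p.completion_time is not None and p.completion_time < p.end_time:
            rows.append(Clone(p.pid, "partial", p.completion_time, False))
        else:
            rows.append(Clone(p.pid, "partial", p.end_time, p.event))
    return rows


people = [
    Person(1, 3.0, 20.0, True),   # completes early, later has the outcome
    Person(2, None, 30.0, False), # never completes, no outcome
    Person(3, None, 5.0, True),   # outcome during the grace period
]
for row in clone_and_censor(people):
    print(row)
```

Person 3's outcome appears in both arms because it occurs before either strategy could be distinguished; person 2 is censored at 12 months in the "complete" arm only.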

19

Interpretable Machine Learning for Population-Level Severe Tooth Loss Prediction: A Two-Axis External Validation

LAM, Q. T.; Fan, F.-Y.; Wang, Y.-L.; Wu, C.-Y.; Sun, Y.-S.; Vo, T. T. T.; Kuo, H.; Kha, Q. H.; Le, M. H. N.; Vu, G.; Le, N. Q. K.; Lee, I.-T.

2026-04-05 dentistry and oral medicine 10.64898/2026.04.03.26350106 medRxiv

Top 0.5%

0.5%

Objectives: Machine learning can predict severe tooth loss (STL, 6 or more missing teeth), but opaque black-box models that neglect complex survey designs limit clinical adoption. This study developed and externally validated an intrinsically interpretable, survey-weighted framework for population-level STL prediction, capturing complex socio-behavioral and systemic health determinants. Methods: We analyzed nationally representative data from BRFSS 2022 (derivation, N=433,772), BRFSS 2024 (temporal validation, N=448,213), and the clinically examined NHANES 2015-2018 (cross-domain validation, N=10,775). Missing data were imputed using an anti-leakage HistGradientBoosting MICE pipeline that preserves multivariate epidemiological variance. An Explainable Boosting Machine (EBM, GA2M) was trained natively with complex survey weights. For external clinical validation, structural domain shift was addressed through non-parametric isotonic regression recalibration. Results: The EBM achieved strong temporal stability on BRFSS 2024 (AUC: 0.8627; Brier score: 0.0845). On cross-domain validation against NHANES 2015-2018, the calibrated model demonstrated robust transportability (AUC: 0.7504; Brier score: 0.1358). Notably, the zero-shot EBM (AUC: 0.7591) closely matched the predictive ceiling of a black-box stacked meta-ensemble (AUC: 0.7706), eliminating the need for unstable post-hoc approximations. Fully auditable shape functions revealed non-linear risk thresholds and synergistic pairwise interactions for key predictors including age, smoking, income, and diabetes. Decision curve analysis showed substantial positive net clinical benefit across risk thresholds from 5% to 50%. Conclusions: The MICE-EBM framework predicts STL with complete intrinsic transparency and robust probabilistic reliability. By generalizing across unobserved temporal and clinical cohorts, this TRIPOD+AI-compliant framework provides a clinically deployable tool to optimize targeted dental public health interventions.
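Isotonic recalibration fits a non-decreasing step function from raw predicted risks to observed outcome frequencies, usually with the pool-adjacent-violators (PAV) algorithm. The sketch below is a self-contained PAV implementation with a Brier score helper; it is a generic illustration, not the authors' pipeline. It predicts with a step function, whereas scikit-learn's `IsotonicRegression` interpolates linearly between fitted points, and in practice the map would be fit on a separate calibration sample rather than on the evaluation data.

```python
import numpy as np


def fit_isotonic(scores, labels):
    """Pool-adjacent-violators fit of a non-decreasing map score -> risk."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Merge tied scores first so each unique score gets one fitted value.
    xs, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse, weights=labels)
    blocks = []  # each block: [sum of labels, weight, number of unique scores]
    for s, c in zip(sums, counts):
        blocks.append([s, float(c), 1])
        # Pool while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s2, w2, n2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += w2
            blocks[-1][2] += n2
    fitted = np.concatenate([np.full(n, s / w) for s, w, n in blocks])
    return xs, fitted


def predict_isotonic(xs, fitted, new_scores):
    # Step-function lookup; scores outside the fitted range are clipped.
    idx = np.searchsorted(xs, np.asarray(new_scores, dtype=float), side="right") - 1
    return fitted[np.clip(idx, 0, len(xs) - 1)]


def brier(probs, labels):
    return float(np.mean((np.asarray(probs) - np.asarray(labels)) ** 2))


raw = [0.1, 0.2, 0.3, 0.4]
y = [0, 1, 0, 1]
xs, fitted = fit_isotonic(raw, y)
print(fitted)                                              # [0.  0.5 0.5 1. ]
print(brier(raw, y), brier(predict_isotonic(xs, fitted, raw), y))  # 0.275 0.125
```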

20

A formula for the basic reproduction number of an infectious disease in a heterogeneous population with structured mixing

Colman, E.; Chatzilena, A.; Prasse, B.; Danon, L.; Brooks Pollock, E.

2026-03-30 epidemiology 10.64898/2026.03.27.26349419 medRxiv

Top 0.5%

0.5%

The basic reproduction number of an infectious disease is known to depend on the structure of contacts between individuals in a population. This relationship has been explored mathematically through two well-known models: one which depends on a matrix of contact rates between different demographic groups, and another which depends on the variability of contact rates over the population. Here we introduce a model that combines and generalises these two approaches. We derive a formula for the basic reproduction number and validate it through comparisons to simulated outbreaks. Applying this method to contact survey data collected in Belgium between 2020 and 2022, we find that our model produces higher estimates of the basic reproduction number and larger relative changes over periods when social contact behaviour was changing during the COVID-19 pandemic. Our analysis suggests some practical considerations when using contact data in models of infectious disease transmission.
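The two classical models the abstract refers to can be written compactly: with a contact matrix, R0 is the spectral radius of the next-generation matrix; with contact-rate variability under proportionate mixing, R0 scales with E[k²]/E[k] = mean + variance/mean rather than the mean alone. The sketch below shows only these two textbook formulas, not the combined formula derived in the preprint. The names `transmissibility` (per-contact transmission probability) and `duration` (mean infectious period, in the same time unit as the contact rates) are assumptions.

```python
import numpy as np


def r0_contact_matrix(contacts, transmissibility, duration):
    # Next-generation matrix K = tau * D * C; R0 is its spectral radius.
    # The spectral radius of C equals that of C.T, so the row/column
    # convention for C does not affect the result.
    k = transmissibility * duration * np.asarray(contacts, dtype=float)
    return float(np.max(np.abs(np.linalg.eigvals(k))))


def r0_contact_variability(contact_rates, transmissibility, duration):
    # Proportionate mixing with heterogeneous rates:
    # R0 = tau * D * E[k^2] / E[k] = tau * D * (mean + var / mean).
    k = np.asarray(contact_rates, dtype=float)
    return float(transmissibility * duration * np.mean(k**2) / np.mean(k))


# Hypothetical example with tau * D = 1.
print(r0_contact_matrix([[2.0, 0.0], [0.0, 2.0]], 1.0, 1.0))  # 2.0
print(r0_contact_variability([1.0, 3.0], 1.0, 1.0))           # 2.5
```

With contact rates of 1 and 3 per unit time in equal proportions, the variability formula gives 2.5, above the value of 2 implied by the mean rate alone; this upward effect of contact heterogeneity is one reason different model structures can yield different R0 estimates from the same survey data.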